ABSTRACT

The literature on teaching evaluation has long 
recognized that it is simply not possible now, or perhaps ever, to
isolate from among all the variables that are interacting the
individual teacher's contribution to changes in the learner, many of 
which are complex, subtle, and may not be observable until much later 
in the student's life. Thus, other criteria, usually judgmental in 
nature, have formed the basis of efforts to evaluate teaching. A 
widely stressed admonition is that one should never rely solely on a 
single source of data, but should use several or all of these forms 
of judgment. The sources of first-hand data that have been most often 
suggested (and which are discussed in this document) are faculty 
self-evaluation, peer evaluations without visitation, and the
student's evaluation of teaching. (Author/KE) 
The decade of the '70s has brought to the academic
community a renewed interest in improving teaching; 
and the evaluation of teaching, as one means to this 
goal, is receiving a tremendous surge of attention. Un- 
fortunately, this attention sometimes degenerates into 
acrimonious argument among faculty members about 
what procedures should be used or, on the other hand, 
results in an uncritical and uninformed plunge into col- 
lecting all sorts of opinions— nowadays, mostly student 
opinions. 

In the current rush to bring some degree of objectivity 
to the evaluation of teaching, excellent advice is being 
ignored. For those who would heed it, the place to start
is with The Recognition and Evaluation of Teaching, by
Kenneth Eble, director of the Project to Improve Teach-
ing, jointly sponsored by the Association of American
Colleges and American Association of University Pro-
fessors. But additional required reading includes
McKeachie, the AAUP Statement on the Evaluation of
Teaching, and Miller. What is now needed, I believe, is
a careful examination of some of the experience and 
findings available from serious efforts to evaluate teach- 
ing. Which procedures establish the grounds for mean- 
ingful and reliable decisions about teaching quality? 
What kinds of data contribute to sound assessment of 
the teaching of individual professors when promotion or
tenure is at issue? At least as important are guidelines as 
to how evaluation can contribute to the goal of improv- 
ing college teaching. 



At the outset, we should sharply differentiate these 
two quite different purposes that evaluation serves. 
Scriven has called evaluation for improvement "forma-
tive evaluation." Necessary to individual instructors on
a continuous basis, its requirements differ markedly 
from those of "summative evaluation" (Scriven), which 
becomes relevant only periodically, perhaps every two 
or three years, as a significant contribution to academic 
deliberations about tenure and promotion. The crucial 
demand on summative evaluations is that they provide 
the basis for fair decisions. For unless teaching quality is 
rewarded in a way which is perceived by faculty as fair, 
there will be little motivation for formative evaluation or
the improvement that it facilitates. 

What is the essential first step in establishing sound 
procedures for the summative evaluation of teaching? 
Many writers have emphasized that it should be a clear 
statement of the criteria by which teaching is judged, 
along with the specification of the weight to be 
accorded in academic decisions to the teaching per- 
formance of each faculty member. (Weights for the 
other traditional academic functions, namely scholarly contri-
bution to one's discipline, academic or community ser-
vice, or any nontraditional responsibilities, should, of
course, also be clearly specified.) The process by which 
these individualized statements can be effectively pre-
pared has been so well laid out by others that there is
no need to repeat here the principles which govern it, 
the influences to which it should be sensitive, or the 
crucial role of department chairpersons and deans in
implementing it. 

The literature on teaching evaluation has long recog- 
nized that it is simply not possible now, or perhaps ever, 
to isolate from among all the variables which are inter- 
acting the individual teacher's contribution to changes 
in the learner, many of which are complex, subtle, and 
may not be observable until much later in the student's 
life. Thus, other criteria, usually judgmental in nature, 
have formed the basis of efforts to evaluate teaching. 
The desirability of using first-hand data in tenure and 
promotion decisions has been frequently ac-
knowledged, and the sources of first-hand data that
have been most often suggested are faculty self-evalua- 
tions, peer evaluations based on classroom visitation, 
peer evaluations without visitation, and students' 
evaluations of teaching. A widely stressed admonition is 
that one should never rely solely on a single source of
data, but should use several or all of these forms of 
judgment. Let us examine separately each of the recom- 
mended sources of first-hand data. 



Classroom Visitations 

Classroom visitation by colleagues has been tried in a 
number of different forms. The general finding is that it 
does not provide a sound method of evaluating the 
teacher's in-class activities. A few classroom visits by 
one colleague cannot be expected to produce a reliable 
judgment. (The terms reliable and reliability are used 
simply to mean consistency among judgments, 
including their repeatability.) Even when the number of 
colleagues is increased to three, and each makes at least 
two visits, the reliability of resulting evaluations is so
low as to make them useless. 

Whether these ratings would ever attain the reliability 
of the pooled judgments of students in a class, who
observe and experience teaching for an entire term, is a 
question which has not been studied. Knowledge of the 
conditions which produce sound ratings would lead us 
to believe that they would not. A major problem is that
the anonymity of the raters cannot be preserved. Even if
three or four colleagues visited (not many faculty take
kindly to the idea of having a team of colleagues present 
at each class meeting), and the ratings were low, the 
givers of the low ratings would then be known to the 
teacher, who would typically have to interact with these 
evaluators on a daily basis. It is little wonder, then, that 
where colleague visitation has been tried, all ratings
tend to be very high. In a study where 54 teachers
were evaluated on the basis of classroom visitation (two
visits by each of three colleagues), 94% of all ratings
were in the top two categories of a five-point scale.



Arthur Eastman confirms this effect in his delightful
article, "How Visitation Came to Carnegie-Mellon Uni-
versity": "Visitors were generous . . . most (teachers)
were encouraged at the approval they received."

Such a positive bias prevents the attainment of 
reliability, which depends in part on discerned differ-
ences in performance among individuals as against the
perceived sameness of everyone. 

Scott Edwards has suggested possible reasons for
this high positive bias. "What department member 
conducting a class visit, knowing that he who evaluates 
today is himself evaluated tomorrow, can fail to see the 
need of a discreet reciprocity? Even where such con- 
cerns are not present, as when an evaluator enjoys a 
sufficient protection of tenure and rank, the too close 
acquaintance of department members does not permit 
the placing of much confidence in their assessments of 
each other." 

Thus the absence of adequate reliability renders this
source of first-hand data useless for purposes of sum- 
mative evaluation. 

On the other hand, colleague classroom visitation can 
be a valuable source of suggestions for the improve- 
ment of teaching. A system of visitation, free from the 
responsibility to record a formal evaluation and engaged 
in by an entire department, can stimulate discussion and 
concern among faculty about their teaching, and may 
prove a powerful motivator for teachers to be better 
prepared for each class. While most of us may not mind 
being thought of by our peers as not very polished in 
our delivery or skilled in leading discussions, we cer- 
tainly do not want to be regarded as having given only 
superficial thought to the organization of our subject 
matter or to the current developments in our fields. A 
general sharing of observations and discussion of prob- 
lems, perhaps at weekly brown-bag lunches, could 
bring a healthy openness to the traditional "conspiracy
of silence" about problems encountered in teaching.



Self-Evaluation 

With respect to self-evaluation, the evidence again 
does not support the use of this source of first-hand 
data as a basis for decisions about teaching quality. 
Blackburn and Clark collected separate evaluations of
teaching effectiveness for 45 full-time faculty members
from four different sources: students, administrators,
faculty colleagues, and the professors themselves.
This study found that self-ratings showed near zero cor- 
relations with ratings made by each of the other sources 
of judgment. The investigators conclude that, "The 
professor lives with an erroneous perception of how 
others perceive and assess him." 
Centra compared teacher self-evaluations with
those made by students. "The results demonstrate a
clear discrepancy between the way most teachers de- 
scribe their instruction and the way students describe it. 
Not surprisingly, most teachers . . . viewed themselves 
in more favorable terms, particularly on such matters as 
whether they stimulated student interest, the extent to 
which the course objectives were met, and whether the 
instructor seemed open to other viewpoints. Of course 
there were some teachers who viewed themselves very 
much as their students viewed them, and even a few 
had more negative perceptions. Nevertheless the
majority saw themselves in rather glowing terms."

Thus, self-analysis cannot provide the kind of data
needed for summative evaluation. 

As with classroom visitation, self-analysis, along with 
other sources of feedback, can contribute positively to 
formative evaluations. By comparison of perceptions 
from other sources with their own self-descriptions,
faculty members can be alerted to examine further
whatever discrepancies occur. McKeachie has pointed
out that feedback that differs from our own perceptions 
or which adds new information is much more likely to 
be followed by change in behavior than is feedback that 
simply confirms what we already know. 



Peer Evaluations (without class visitation)

There remain two other sources of first-hand data, 
peer evaluations without visitation and student evalua- 
tions of teaching. What has not been sufficiently clari- 
fied in most writing on the evaluation of teaching is that 
both peer and student judgments are essential to sum-
mative evaluation; one without the other will lead to un-
fair decisions. Even when both of these forms of evalua-
tion are carried out, unfair decisions can still result
unless very careful thought is given to the role of each. 

It does very little good to obtain from faculty peers a
single global judgment of a colleague's teaching effec-
tiveness. Some years ago, Edwin Guthrie and I ex-
amined, through factor analysis, the relation of
colleague judgments of teaching to student ratings of 
teaching quality. Data were available for 121 faculty 
members, each of whom had been evaluated by his or 
her students and also by committees of six or seven 
faculty colleagues. The six items judged by students
and eight items judged by faculty were the following:

Evaluations by students:

1. Teaching effectiveness
2. Clear and understandable in explanations
3. Active personal interest in the progress of the class
4. Friendly and sympathetic manner
5. Shows interest and enthusiasm in subject
6. Gets students interested in subject

Evaluations by committees of faculty colleagues:

7. Teaching effectiveness
8. Contribution to field through research and publication
9. Contribution to community or state
10. Ability to cooperate with other members of department
11. Knowledge of subject
12. General knowledge and range of interest
13. Rate of professional growth
14. Recognition by others in his or her profession

The analysis revealed that these items clustered into 
three independent factors. The first involved all six 
items judged by students; the second consisted of 
faculty judgments on four items: 

8. Contribution through research and publication 

11. Knowledge of the field 

13. Rate of professional growth 

14. Recognition by others in his or her profession 

The third was measured by items: 

7. Teaching effectiveness (judged by colleagues) 

9. Contribution to community or state 

10. Ability to cooperate with other members of de- 
partment 

12. General knowledge and range of interest 

Guthrie called the first factor the teacher's impact on
students; the second, impact on one's profession; and 
the third he called impact on colleagues. The chief point 
of interest here is that teaching effectiveness as judged 
by colleagues measures something quite different from 
the group of items which involve impact on students. 
Colleagues probably judge one another as teachers on
the basis of things they can observe ("Readiness to 
work with others in the department in arranging sched- 
ules, examinations, and the mass of operational detail 
on which members of a department must agree" as
well as on breadth of general interests), not on the basis 
of classroom activities. Shoben, another psychologist,
has suggested an equally plausible interpretation of this 
third factor— that it represents a general likableness and 
"reputation as a good colleague. It suggests that none 
of us is likely to designate a likable guy as a poor instruc- 
tor unless contrary evidence arises to strike us across 
the chops." If this is true, then how can peer judgments
be used in the evaluation of teaching? 

Peer evaluations can be used only if we break down 
the global judgment of teaching quality into those char- 
acteristics which faculty do observe, and if we use cer- 
tain psychometric controls in obtaining them. Behind 
the need for both student and peer judgments lie two
well-established principles of psychological measure-
ment. The first is that judgments of complex human
performance cannot be valid unless they are based on 
adequate observations of the performance or charac- 
teristics to be rated. The second is that the rater must 
have appropriate background against which to compare 
and evaluate what is observed. Students directly 
observe what goes on in the classroom and can make 
judgments about certain aspects of teaching, particu- 
larly those relating to their own experience of it. They do 
not characteristically, however, have the background to 
judge other essential characteristics. On the other hand, 
faculty peers typically do not observe the in-class teach-
ing of their colleagues, nor are they capable of experi-
encing it as do those without their knowledge of the
field; but they do observe and have the background to 
judge characteristics of teachers which students 
cannot. 

One absolute essential of good teaching is the in- 
structor's knowledge of the subject being taught. 
Students, especially freshmen and sophomores, are not
in a position to make this judgment and should not be 
asked to do so. (They may judge whether or not the 
teacher could answer their questions or whether he or 
she presented material beyond the textbook. They may 
also judge whether they felt they had learned something 
new. But this must be distinguished from whether what 
they learned was superficial and out of date or repre- 
sented an in-depth knowledge of the subject.) It is ex- 
actly this point that is demonstrated by the widely 
quoted "Dr. Fox Experiment." In that study, a trained
actor delivered a lecture on mathematical game theory 
to a group of medical educators. He presented incorrect 
information, cited non-existent references, and used 
neologisms as basic terms. When his audience rated the 
lecture, the great majority gave favorable responses to 
questionnaire items regarding its quality. We can be
sure the ratings would have been quite different had the 
lecture been delivered to professors of mathematics. 
The essential point is that judgments about the ac- 
curacy, currentness, or sophistication of a teacher's 
knowledge can only be made by faculty peers conver- 
sant with the same field. 

All of us have heard statements which express senti- 
ments such as, "No one who isn't publishing can be a 
good teacher," or "Those who take all of the time 
necessary to prepare publishable materials do so at the
expense of their students' welfare." The awareness of
the importance of a teacher's knowledge to sound 
teaching has led to some very muddled thinking on this 
point. It has, I believe, been one of the reasons for the 
use of judgments of research quality not only as a cri-
terion for the scholarly achievement of faculty but as a
criterion for their teaching effectiveness as well. Good
teaching requires scholarship: the kind that keeps the
instructor in immediate and thoughtful contact with de- 
velopments in his or her field and with the ideas and 
findings of other scholars. This may not necessarily be 
the kind of scholarship which results in publication. But 
many faculty members do not trust their judgments of a 
colleague's knowledge unless they can see something 
he or she has written, or unless they know that editors 
of respected journals have accepted and published his 
or her work. Peer judgments of a colleague's publica- 
tions are a perfectly legitimate criterion in the evaluation 
of the "professor as scholar," but their substitution as a 
criterion for the evaluation of the "professor as teacher" 
simply misses the mark. It is time for the academic com-
munity to acknowledge that there are other ways of
demonstrating currency and depth in one's field than by 
publishing, and time also for faculty to have the courage 
to trust their judgments about the substantive 
knowledge of colleagues with whom they interact on a 
daily basis. The active, on-going life of an intellectual 
community is filled with discussions of recent develop- 
ments in a field, consultations with others on problems 
and ideas, colloquia, meetings, attendance of lectures, 
etc.; one cannot help developing an informed opinion of 
a colleague's knowledge. 

In addition to evaluating a teacher's knowledge, peer
judgments are needed for the evaluation of at least
three other aspects of teaching. If an appropriately
selected group of colleagues reviews such data as a
teacher's course outlines, texts, syllabi, reading lists, 
and statements of objectives, then they can render a 
useful judgment of the quality of teaching materials. A 
judgment of this sort does not need to produce fine
discriminations, but it can answer relevant questions 
like, "Are the materials current?" and "Do they reflect 
the best work in the field?" and "Are they appropriate 
to the course goals?" 

Some record of the performance of students should 
also be examined by a peer committee. What kinds of 
tests were used, and how did the students perform on 
them? Were they all true-false items, or were they more 
demanding of higher intellectual functions? Were 
papers written or projects carried out? What was their 
quality? What did the students learn? 

This last question is important for any course, but it
has particular significance for many elementary courses 
in which the content is prescribed as the foundations on 
which more advanced courses must build. I will never
forget my disbelief at hearing a young instructor in a 
beginning psychology course say, "My class wasn't 
interested in the neural basis of behavior or the princi- 
ples of sensation and perception, so we skipped those 
topics and discussed something they were interested in, 
the origins of sex-role identification." The origins of sex- 
role identification define a perfectly legitimate topic in 
psychology, usually covered in courses on develop- 
mental psychology or personality theory. One can even 
be sympathetic to the young instructor's desire to en- 
courage his students' interest in a psychological topic. 
But instead of arranging extra class sessions or informal 
meetings to pursue their interests, he chose to skip 
fundamental course content. It is the department's re-
sponsibility, not the students', to see that a teacher
does not sacrifice "hard topics" for more naturally ap- 
pealing ones, and this can be ascertained if peers ask, 
"What did the students learn?" 

Finally, there are aspects of teaching which do not 
bear directly on a faculty member's classroom activities, 
but which should be evaluated and rewarded. These re-
late to the assumption of departmental responsibilities
such as service on curriculum committees, supervision 
of graduate students who are learning to teach, the 
proposal of new courses, and even service on peer 
evaluation committees. 

If peer evaluations are obtained on additional teacher 
characteristics, two guiding principles are critical: (a) 
the judges must be able to observe, outside the class- 
room, what they are evaluating, and (b) they must have 
the background against which to compare what they 
observe. These are such fundamental requirements of 
valid judgment that their emphasis constitutes an em- 
barrassment. Nevertheless, they seem to be the ones 
most frequently and persistently violated. 

To achieve the validity of which peer judgments are 
capable, careful and systematic procedures, related to 
number and choice of judges, instructions, and method 
of obtaining judgments, are essential. While care must 
go into their planning, the procedures themselves are 
relatively simple and less cumbersome than might be 
imagined. 

As mentioned earlier, summative evaluations of 
teaching should be made only periodically— perhaps 
every two years for young untenured instructors, every 
three years for senior faculty. One reason for this timing 
is to allow enough teaching to occur to provide a repre-
sentative segment of a teacher's work. Evaluations
should not rest on one course, or even on several 
courses taught in one term. There should be enough 
data to ascertain trends when a course is taught on 
several occasions or to see whether improvement is 
taking place over time. As much as possible, the evalua-
tions should be staggered, so that about a third of the
members of any given department are being evaluated 
each year. This arrangement makes feasible the use of 
rating controls that are necessary for reliability but that become
burdensome if demanded for all members of a depart-
ment at the same time. In general, the process should
guarantee the anonymity and independence of the 
rater. 

A look at the peer rating procedures used by Edwin 
Guthrie illustrates many of the principles which should 
be followed. Whenever a faculty member was a candi-
date for promotion or tenure, Dean Guthrie requested
him or her to nominate five colleagues who could serve 
as evaluators. They could be from the faculty member's 
own department or from a related department, but the 
essential requirement was that each evaluator be 
conversant with the field of the person to be judged. 
From the five, the Dean chose three and added three 
more of his own choice. Within this structure, he tried 
to insure that no rater was in competition for rank or 
salary with the person evaluated (thus, only tenured 
faculty served on committees) and that there were at 
least two members on the committee from outside the 
department of the candidate, but in a related field. The 
six constituted a secret committee that never met. No
member knew who the other members were nor did he 
know whether he had been nominated by the ratee or 
chosen by the Dean to serve on the committee. He was 
asked not to reveal his appointment to anyone, and 
instructed that this was a matter of academic integrity. 
Each rater was supplied with a set of materials that the 
ratee had provided to the Dean. The task of each rater 
was to arrive at a totally independent judgment on the 
specified characteristics and to write a general state- 
ment about the candidate. Each member returned his or 
her signed ratings directly to the Dean, and the six 
judgments were pooled for each characteristic rated. 

The principles underlying this set of procedures in- 
clude these: 

1. The person being evaluated had some choice, 
with the Dean, of his evaluators. 
2. Because more faculty were nominated than
were chosen, the candidate could not be sure 
which of his or her nominees had been 
appointed and thus could not identify any indi- 
vidual as definitely on the committee. This pro- 
vided a measure of protection for the anonymity 
of the raters. 

3. The secret committee prevented one rater from 
trying to influence the others. No one could act 
as an advocate or an adversary. 

4. Each rater was forced to rely on his own judg- 
ments—not those of others. 

5. The knowledge that the Dean, and only the 
Dean, saw the signed evaluations promoted a 
good deal of care on the part of the evaluators. 

6. The pooling of a set of independent judgments 
gave maximum reliability— better than a jointly 
agreed-upon judgment.

7. The extra-departmental members acted as a
corrective for occasional intradepartmental 
biases. 
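
The selection and pooling steps described above can be sketched in Python. This is only an illustration under stated assumptions: the function names (form_committee, pool_ratings) are hypothetical, random sampling stands in for the Dean's own judgment, the Dean's additions are assumed to come from outside the candidate's nominee list, and a simple mean is assumed as the pooling rule. The account above says only that the Dean chose three of five nominees, added three of his own choice, and pooled the six judgments for each characteristic.

```python
import random
from statistics import mean


def form_committee(nominees, dean_candidates, rng=None):
    """Pick 3 of the candidate's 5 nominees and add 3 of the Dean's choices.

    Random sampling is a stand-in for the Dean's judgment (an assumption).
    """
    rng = rng or random.Random()
    if len(nominees) != 5:
        raise ValueError("the candidate nominates exactly five colleagues")
    from_nominees = rng.sample(nominees, 3)
    # Assumption: the Dean's additions are not drawn from the nominee list.
    eligible = [c for c in dean_candidates if c not in nominees]
    if len(eligible) < 3:
        raise ValueError("the Dean needs three further eligible colleagues")
    return from_nominees + rng.sample(eligible, 3)


def pool_ratings(ratings_by_rater):
    """Average the independent ratings for each characteristic.

    The mean is an assumed pooling rule; the text says only "pooled".
    """
    characteristics = next(iter(ratings_by_rater.values())).keys()
    return {
        c: mean(r[c] for r in ratings_by_rater.values())
        for c in characteristics
    }


if __name__ == "__main__":
    committee = form_committee(
        ["A", "B", "C", "D", "E"], ["F", "G", "H", "I"], random.Random(0)
    )
    print(committee)
```

The sketch omits the further constraints mentioned in the text (tenured raters only, and at least two members from a related outside department), which would be applied when the lists of eligible colleagues are drawn up.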

Obviously, the Guthrian model is not the only way in 
which reliable peer judgments can be collected, but the 
principles illustrated enjoy considerable importance and 
are often overlooked. Another model in use requires 
that both the dean's and the candidate's choices come 
from an elected committee of the college faculty. An 
important consideration here is that such a committee 
be large enough to afford choice, and especially large 
enough that each candidate can nominate more com- 
mittee members than will be selected; otherwise 
anonymity cannot be preserved. Other illustrations that 
incorporate the essential safeguards could be described, 
but institutions vary so widely in size and organization
that no set of models will serve all. As Richard I. Miller
often emphasizes, only if a college "adapts, not adopts"
will a particular system work within its structure. 

The practice of having evaluations arrived at in meet- 
ings of peer judges should be discouraged for two rea- 
sons: 1) it destroys the independence of judgments and 
2) it fails to protect the evaluation process from the 
subtle and complex interplay of social and 
psychological variables present in face-to-face meet- 
ings. Such a procedure is often followed under the 
belief that a gathering of the group facilitates informa- 
tion exchange, but this function can be accomplished in 
other ways. A covert advocacy or oppositional stance
on the part of a peer can often be couched in what 
appears to be an unbiased and reasoned argument. 
Even seemingly objective committee discussions are not 
free of personality interactions based on friendship, 
charisma, or respect for another's status; nor do they 
prevent the interplay of factors such as a desire to 
please, a history of exchanged favors, or an unwilling-
ness to speak up in the presence of stronger individuals,
who thereby "wield disproportionate influence." This
is not to say, of course, that many faculty cannot 
maintain an unbiased position in the presence of these 
factors; but generally, open meetings do not provide the 
conditions that maximize objectivity of judgment on the 
part of all evaluators. When peer evaluators do not 
know who the other members of the evaluation com- 
mittee are, the effects of such variables either cannot 
operate or are held to a minimum. 



How many raters are needed? We know that a single 
rating is not, in general, reliable. Because pooling the 
ratings from as many as three judges substantially im- 
proves reliability, peer committees should probably have 
at least three members, more if possible. Another 
important consideration is that the system must have 
enough flexibility to be used in all departments. If a de- 
partment is so small that it has only two tenured faculty, 
then only one can be appointed and the other two 
judges must come from allied fields. The principle of 
anonymity of the rater may not be protected perfectly in 
such cases, but it should be guarded as carefully as is 
possible. There is abundant evidence that ratings
made without the protection of anonymity have neither 
the validity nor the reliability of ratings made with the 
guarantee that the rater will remain anonymous to the 
person being judged. Peer ratings based on the princi- 
ples outlined here can provide one source of usefully 
cogent data to be examined along with student ratings 
and with recommendations from departmental chair- 
persons. 
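
The gain from pooling several independent ratings, noted above, can be illustrated with the Spearman-Brown formula, a standard psychometric result that the text does not name explicitly. The sketch below is in Python. The single-rater reliability of 0.30 is a hypothetical value chosen only for illustration, not a figure from any study cited here, and the formula assumes that the raters are interchangeable and judge independently.

```python
def pooled_reliability(single_rater_reliability: float, n_raters: int) -> float:
    """Spearman-Brown estimate of the reliability of the mean of n_raters ratings."""
    if n_raters < 1:
        raise ValueError("n_raters must be at least 1")
    r = single_rater_reliability
    return n_raters * r / (1 + (n_raters - 1) * r)


if __name__ == "__main__":
    hypothetical_r = 0.30  # assumed single-rater reliability, illustration only
    for k in (1, 3, 6):
        print(k, round(pooled_reliability(hypothetical_r, k), 2))
```

Under that assumed starting value, pooling three raters gives an estimate of about 0.56 and pooling six gives about 0.72, which is consistent with the advice to use at least three judges and more if possible.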

Even here, where the purpose is so clearly summative 
in nature, preparation by the ratee of materials for a 
faculty-peer committee may also serve formative eval-
uation. Thus, the self-presentation process may contri-
bute to self-evaluation, especially if, in addition to as-
sembling samples of syllabi, tests, graded papers, etc.,
the faculty member prepared an analytical paper on the 
development of each course taught, on his or her own 
development as a teacher, and on the changes made 
over the years in a particular course. Such self-analysis 
could become the starting point for efforts to improve. 



Student Evaluations of Teaching 

The characteristics of good teaching that colleagues 
can judge are essential ones but not sufficient, for they 
tell us nothing of what transpires within the classroom. 
Much published work has established the reliability and 
some types of validity of student evaluations of teach-
ing. There is no doubt that if the best known proce-
dures are used, student judgments can provide an ex-
cellent source of first-hand data. How much faith can be
placed in these judgments will depend on the quality of
the instrument and of the procedures employed to
collect them. As with anything else, that quality can 
range from sound and sophisticated to sloppy and inac- 
curate. Careful planning and discussion, the 
commitment of resources, and some expert advice must 
precede their use. 

Before any type of student opinion is obtained, two 
basic issues should be understood. The questions raised 
by the differing requirements of formative and summa-
tive evaluations contrast sharply with those posed by
the second issue. The first focuses on consequences of 
distinctive evaluation purposes; the second concerns 
the role that an institution wishes its students to assume 
in the evaluative process. 

If the purpose of obtaining student judgments of 
teaching is wholly that of providing feedback as a basis 
for the individual professor to improve, and if the results 
are not to be used in any administrative decision, then 
the answers to a whole set of questions regarding pro- 
cedures are automatically determined. For example, the 
questionnaire items to which the students are asked to 
respond can be framed by the individual professor or by 
a group of faculty. The items may, but need not, go 
through elaborate processes of refinement. The ques- 
tionnaire can contain many items that are as detailed 
and specific as possible to the course taught, so as to 
give clues for improvement. Items dealing with the 
teacher's style, the text, the exams, and all aspects of 
the course are appropriate. The guiding principle in item 
selection is simply that the information might help a 
professor improve. 

Additionally, administration of a student question- 
naire can be left in the hands of the individual teacher 
and there is no necessity that the ratings be numerical. 
If the resources for obtaining quantified ratings are
available, the professor may obtain fairly precise
information; if they are not, qualitative evaluation can
be used. And finally, only the
faculty members involved should see the results. It is es- 
pecially important that results of formative evaluation 
NOT be given to administrators. Lacking the require- 
ments of summative evaluations, student judgments 
obtained solely for the teacher's improvement can lead 
to inaccurate comparative assessments of teaching 
quality. 

When student judgments are to be considered in
summative evaluations, a wholly different set of procedures
is dictated in order to ensure the comparability,
accuracy, and consistency of the results necessary to
their use in the academic decision process. A standard
questionnaire, one which has been carefully derived and
subjected to considerable refinement, is necessary to
provide comparability among professors. Only a very
small number of items, covering the qualities common
to all good teaching,^^ should be used. Items dealing
with teaching style should not be included, for in the
hands of an administrator they provide a temptation to
consider one teaching style better than another. There
is abundant evidence^* that no one style, per se, produces
superior learning; rather, the style with which an
individual teacher can be most successful depends on a
host of variables, including his or her own personality,
the subject matter taught, the students' backgrounds,
the goals of the course, and many others.^

Quantified responses are necessary for comparability,
as are norms against which the numerical ratings can be
compared. Each college should determine the type of
norms needed. This decision rests on discovering what
variables, such as faculty rank, class size, and course
level, result in overall differences in student ratings. The
widely quoted studies^ that imply that the same variables
are operating on all campuses misrepresent the
evidence. Which variables contribute to differences
depends on the evaluative instrument used and the
nature of the students and faculty at a particular school.
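
The idea of locally determined norms can be illustrated with a
minimal sketch. It assumes, purely for illustration, that a
college has grouped its course ratings by class size and wishes
to read each rating against the average for its own group. The
figures, the group labels "small" and "large," and the function
names are all hypothetical; none comes from the studies cited.

```python
from statistics import mean

# Hypothetical records: (class-size group, overall rating on a 1-5 scale).
# These numbers are invented for illustration only.
ratings = [
    ("small", 4.2), ("small", 4.0), ("small", 4.4),
    ("large", 3.6), ("large", 3.8), ("large", 3.4),
]


def group_means(records):
    """Return the mean rating for each group label."""
    groups = {}
    for label, score in records:
        groups.setdefault(label, []).append(score)
    return {label: mean(scores) for label, scores in groups.items()}


def relative_to_norm(score, label, norms):
    """Difference between one rating and the norm for its own group."""
    return score - norms[label]


# Local norms: one mean per class-size group.
norms = group_means(ratings)

# A 3.9 in a large class sits above the large-class norm, although it
# would sit below the small-class norm.
above_large = relative_to_norm(3.9, "large", norms)
below_small = relative_to_norm(3.9, "small", norms)
```

Whether class size, or any other variable, deserves separate
norms is a question each college would have to settle from its
own ratings; the sketch shows only the comparison once that
choice has been made.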

It goes without saying that for summative evaluation,
the anonymity of the student raters must be
guaranteed.^ Just as important, standard procedures for
administering the student questionnaire are required.
Quantitative ratings can be influenced by the instructions
given regarding the ratings. Evidence here is provided by
an investigation^^ designed to determine "whether the
individual administering an evaluation instrument has any
significant effect on the results. This study, involving
ten sections and 227 students in an introductory
educational psychology course, found a significant
difference (at .05 level) between whether the instructor
or a neutral individual administered the student
evaluation form. Higher ratings were achieved when the
instructor administered the survey."^ Thus control of the
presentation of the rating instrument and of the
instructions regarding its completion is essential and
cannot be left in the hands of the individual instructor.
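
A college wishing to check for such an administration effect in
its own ratings might compare section averages under the two
conditions. The following is a minimal sketch using an exact
permutation test on the difference in means. The investigation
quoted above does not state which statistical test it used, so
this technique is a substitute chosen for simplicity, and the
ten section averages are hypothetical figures invented for
illustration.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference in group means.

    Every way of splitting the pooled values into groups of the original
    sizes is enumerated; the p-value is the share of splits whose absolute
    mean difference is at least as large as the one observed.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    total = 0
    extreme = 0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point rounding.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical mean ratings for ten sections (invented for illustration).
instructor_led = [4.3, 4.1, 4.4, 4.2, 4.5]  # instructor handed out the form
neutral_led = [3.8, 3.9, 4.0, 3.7, 3.6]     # a neutral person handed it out

p = permutation_p_value(instructor_led, neutral_led)
significant_at_05 = p < 0.05
```

With five sections in each condition there are only 252 possible
splits, so the enumeration is exact and needs no statistical
tables. A result of this kind would say nothing about why the
ratings differ, only that the manner of administration cannot be
ignored.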

Finally, there must be an equitable process, mutually 
agreed upon by faculty and administrators, by which 
the ratings are communicated to the department chair- 
person or dean. The policy used should take into 
account the realities of any educational institution. It is
well known that on occasion there are variables beyond
the control of the individual professor which adversely
affect the quality of his or her teaching. An especially
heavy work load may be assigned in a particular term,
making necessary class preparation impossible; the
size of a particular class may not be appropriate to the 
skills of the teacher; a personal tragedy in the profes- 
sor's life may occur during a particular term; or some- 
one may have to fill in for a colleague on leave by teach- 
ing courses outside his or her special area of knowledge. 
Requiring that a sufficient sample of evaluations from 
all courses taught be submitted but permitting each in- 
structor some choice as to which ones are presented 
usually prevents unfairness in these matters. 

Theoretically, it is possible to collect student judgments
for formative and summative evaluations simultaneously,
perhaps by using a two-part instrument. If such a dual
attempt is made, however, then all of the procedural
safeguards for summative evaluations must be observed.

Addressing the second major issue--namely the role
of students in the summative process--helps clarify the
nature of their judgments. Two general positions are
currently prevalent. One of these regards the student as
a reporter who observes and transmits information on
what takes place within a class. Menges^ expresses this
view when he says, "I believe that the instructor and his
faculty colleagues, rather than students are the proper
interpreters and weighters of student observations."
The alternate view holds that college students are fully
capable of functioning as "evaluators" as long as they
are asked to judge only those aspects of teaching for
which they have the appropriate background to make
comparisons. This position allows students to participate
with faculty colleagues and administrators in the
evaluative process. Its supporters believe that college
students have experienced enough teaching to be able
to say that one professor is "outstanding, better than
most, or only fair in comparison to other teachers I have
known" in his or her efforts to promote understanding
of the subject matter, or in stimulating or motivating
more active intellectual efforts.

Here again, these different views dictate different 
types of student rating instruments. One of the logical 
consequences of viewing the student as a reporter is 
that the items placed on the questionnaire are chosen 
because faculty believe they describe the qualities most 
important to good teaching. The items may be derived
from faculty discussion or from educational theory.
They may even be subjected to rather elaborate
methods of refinement, but their ultimate justification
lies in their origin. The other consequence of this view
of students is that they are provided only with
descriptive, not evaluative, terms for registering their
observations. The response categories on these
questionnaires indicate frequency, amount, or agreement
(e.g., rarely, sometimes, frequently; less than, about the
same as, or more than in most courses; or strongly agree,
agree, . . . strongly disagree).

Viewing students as "evaluators" entails selecting 
items for the questionnaire because they have been 
shown to carry weight in differentiating those teachers 
whom the students have judged as good from those 
they have evaluated as poor.* Additionally, students
register their responses in evaluative terms:
outstanding, excellent, better than most, competent,
average, only fair, in need of serious improvement,
poor.

Many existing instruments contain items for both
formative and summative purposes, some of which 
require the student to be a reporter, some an evaluator. 
These combinations have resulted in most cases from a 
lack of awareness of the issues, and they contribute to 
confusion in how the results can be used. 



Once these fundamental distinctions have been ad- 
dressed, a college can then seek expert help in planning 
and implementing a sound system of teaching evalua- 
tion. This is not a matter to be entered into lightly. Each 
institution must decide whether improving the quality of 
its teaching is one of its goals and whether that goal is 
worth the effort to achieve it, including the effort which 
must go into evaluation. And here lies an issue so crucial
that it cannot be ignored: the problem of how a
system of evaluation can be initiated and fostered.
Although pressure to reward teaching merit occasionally
comes from faculty, the impetus for instituting sound
evaluation procedures cannot, in general, be expected
to originate with them. The idea of systematic evaluation
in an area of professional functioning for which
most faculty received little or no formal training, and
precious little help or advice, is understandably
threatening, and often engenders massive resistance. It is not
surprising, then, that many professors prefer to be 
judged solely on their role as scholars, for which they 
have had long and arduous training. Nor do some 
faculty care to be judged in areas of performance which 
they know will not be rewarded. It is well to remember
that within institutions of higher education, the visible
rewards of salary increase and promotion are primarily
controlled by deans and department chairpersons. Thus
it is that serious efforts to evaluate teaching, either by 
peers or students, come about largely through the 
leadership of informed administrators. Even where
resources of expertise exist, these will have limited effect
without the administrative support which gently guides
a faculty through discussions of the issues basic to
evaluating teaching. And only when those who make
academic decisions value the teaching role, attend to its
different levels of merit, and reward it fairly are sound
evaluative procedures sought. It is no accident that at
each of the institutions nationally recognized as leaders
in teaching evaluation, there are one or more academic
officers who understand and stimulate these developments.
Where evaluation efforts have floundered or
failed, it is often for lack of administrative support.

Responsible concern for teaching quality goes 
beyond evaluation. Institutions that espouse this goal 
must provide resources for faculty development in the 
practice of this vital skill. The relatively recent
recognition by colleges and universities of the nature of
this responsibility underlies the currently emerging
concept of faculty development, one principal entailment
of which is direct assistance to professors who want to
improve their effectiveness as teachers.

Attention to the quality of teaching will not solve all
of the problems of academia. F.B. Morgan, Jr.^ is right
when he says of evaluation, ". . . it will not usher in the
Kingdom!" But an understanding of basic issues may
reduce some of the controversy surrounding the choice
of procedures and improve efforts to reward teaching
fairly.